Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization
Abstract
Because the information a user needs is often spread across multiple documents on the web, summarizing these documents and efficiently ordering the sentences in the summary has become a relevant task in data mining. We present a novel sentence ordering method based on the maximum cost spanning tree algorithm to improve the readability and cohesion of summaries extracted from multiple related documents. Candidate sentences for the summary are extracted from the documents by ranking them with the cosine similarity measure, and redundancy in the summary is reduced with the Maximal Marginal Relevance (MMR) technique. The sentences in the summary are organized by constructing a graph in which each sentence is a node and an edge between every pair of nodes is weighted by the similarity between the two sentences. The most important task in our work is to find the first sentence of the ordered summary, which we identify as the sentence that has minimum similarity with the sentences in the extracted summary. The remaining sentences are then placed one by one using Prim's maximum cost spanning tree algorithm. The proposed algorithm was tested on the DUC 2002 data set, and the summary generated after ordering has better readability and cohesion than the one generated without ordering. The results are more pronounced as the summary size increases.
General Terms: Data Mining
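The ordering step described in the abstract can be illustrated with a minimal sketch. Several details are assumptions, since the abstract does not specify them: sentences are compared with cosine similarity over raw word counts, the first sentence is taken to be the one with the lowest total similarity to the other sentences, and the final order is the order in which Prim's algorithm adds nodes to the maximum cost spanning tree. The function names `cosine_similarity` and `order_sentences` are hypothetical and are not taken from the paper. The extraction and MMR steps are not shown.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two sentences over raw word counts (an assumption)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def order_sentences(sentences: List[str]) -> List[str]:
    """Order summary sentences by growing a maximum cost spanning tree."""
    n = len(sentences)
    if n == 0:
        return []

    # Complete graph: sim[i][j] is the weight of the edge between i and j.
    sim = [
        [cosine_similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]

    # Assumption: the first sentence is the one with the lowest total
    # similarity to all other sentences in the summary.
    first = min(range(n), key=lambda i: sum(sim[i]))

    order = [first]
    remaining = set(range(n)) - {first}
    # best[j] is the heaviest edge connecting sentence j to the tree so far.
    best = {j: sim[first][j] for j in remaining}

    # Prim's algorithm with max instead of min: repeatedly attach the
    # sentence with the heaviest edge into the current tree.
    while remaining:
        nxt = max(sorted(remaining), key=lambda j: best[j])
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            best[j] = max(best[j], sim[nxt][j])

    return [sentences[i] for i in order]
```

Calling `order_sentences` on the extracted summary sentences returns the same sentences in the proposed order. Ties are broken by the lowest original index, which is a choice made here for determinism.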
Similar resources
A preference learning approach to sentence ordering for multi-document summarization
Ordering information is a difficult but important task for applications that generate natural-language text, such as multi-document summarization, question answering, and concept-to-text generation. In multi-document summarization, information is selected from a set of source documents. Therefore, the optimal ordering of those selected pieces of information to create a coherent summary is not obv...
Sentence ordering with manifold-based classification in multi-document summarization
In this paper, we propose a sentence ordering algorithm that uses semi-supervised sentence classification and a historical ordering strategy. The classification is based on the manifold structure underlying sentences, addressing the problem of limited labeled data. The historical ordering helps to ensure topic continuity and avoid topic bias. Experiments demonstrate that the method is effective.
Significance of Sentence Ordering in Multi Document Summarization
Multi-document summarization represents information in a concise and comprehensive manner. In this paper we discuss the significance of sentence ordering in multi-document summarization. We show experimental results on the DUC2002 dataset. These results show the ordering of summaries before sentence ordering is applied and the improvement after it is applied. For this purpose we used a term frequency...
Sentence Clustering-based Summarization of Multiple Text Documents
With the rapid growth of the World Wide Web, information overload is becoming a problem for an increasingly large number of people. Automatic multi-document summarization can be an indispensable solution for reducing the information overload problem on the web. This kind of summarization facility helps users to see at a glance what a collection is about and provides a new way of managing a vast hoa...